In inverse reinforcement learning (IRL), a learning agent infers a reward function encoding the underlying task using demonstrations from experts. However, many existing IRL techniques make the often unrealistic assumption that the agent has access to full information about the environment. We remove this assumption by developing an algorithm for IRL in partially observable Markov decision processes (POMDPs). We address two limitations of existing IRL techniques. First, they require an excessive amount of data due to the information asymmetry between the expert and the learner. Second, most of these IRL techniques require solving the computationally intractable forward problem -- computing an optimal policy given a reward function -- in POMDPs. The developed algorithm reduces the information asymmetry while increasing the data efficiency by incorporating task specifications expressed in temporal logic into IRL. Such specifications may be interpreted as side information available to the learner a priori in addition to the demonstrations. Further, the algorithm avoids a common source of algorithmic complexity by building on causal entropy, as opposed to entropy, as the measure of the likelihood of the demonstrations. Nevertheless, the resulting problem is nonconvex due to the so-called forward problem. We address the intrinsic nonconvexity of the forward problem in a scalable manner through a sequential linear programming scheme that is guaranteed to converge to a locally optimal policy. In a series of examples, including experiments in a high-fidelity Unity simulator, we demonstrate that even with a limited amount of data and POMDPs with tens of thousands of states, our algorithm learns reward functions and policies that satisfy the task while inducing similar behavior to the expert by leveraging the provided side information.
translated by Google Translate
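We study the problem of inverse reinforcement learning (IRL), in which a learning agent recovers a reward function using expert demonstrations. Most existing IRL techniques make the often unrealistic assumption that the agent has access to full information about the environment. We remove this assumption by developing an algorithm for IRL in partially observable Markov decision processes (POMDPs). The algorithm addresses several limitations of existing techniques, which do not take into account the information asymmetry between the expert and the learner. First, it adopts causal entropy as the measure of the likelihood of the expert demonstrations, as opposed to the entropy used in most existing IRL techniques, and thereby avoids a common source of algorithmic complexity. Second, it incorporates task specifications expressed in temporal logic. Such specifications may be interpreted as side information available to the learner in addition to the demonstrations, and may reduce the information asymmetry. Nevertheless, the resulting formulation is still nonconvex due to the intrinsic nonconvexity of the so-called forward problem, i.e., computing an optimal policy, in POMDPs. We address this nonconvexity through sequential convex programming and introduce several extensions to solve the forward problem in a scalable manner. This scalability allows computing policies that incorporate memory and, at the expense of added computational cost, outperform memoryless policies. We demonstrate that, even with severely limited data, the algorithm learns reward functions and policies that satisfy the task and induce behavior similar to the expert's by leveraging the side information and incorporating memory into the policy.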
Data-driven soft sensors are extensively used in industrial and chemical processes to predict hard-to-measure process variables whose real value is difficult to track during routine operations. The regression models used by these sensors often require a large number of labeled examples, yet obtaining the label information can be very expensive given the high time and cost required by quality inspections. In this context, active learning methods can be highly beneficial as they can suggest the most informative labels to query. However, most of the active learning strategies proposed for regression focus on the offline setting. In this work, we adapt some of these approaches to the stream-based scenario and show how they can be used to select the most informative data points. We also demonstrate how to use a semi-supervised architecture based on orthogonal autoencoders to learn salient features in a lower dimensional space. The Tennessee Eastman Process is used to compare the predictive performance of the proposed approaches.
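To make the stream-based setting concrete, the following is a minimal sketch of one common way to decide, as each unlabeled point arrives, whether its label is worth requesting. It is not the paper's method: the disagreement criterion (variance of a committee's predictions), the function name `stream_queries`, and the `threshold` and `budget` parameters are all assumptions chosen for illustration, and retraining the models after each query is omitted.

```python
from statistics import pvariance


def stream_queries(stream, committee, threshold, budget):
    """Return the indices of stream items whose label would be requested.

    Hypothetical helper: each item is seen once, in order, and the decision
    to query is made immediately (the stream-based setting). `committee` is
    a list of callables mapping an input to a scalar prediction.
    """
    queried = []
    for i, x in enumerate(stream):
        if len(queried) >= budget:
            break  # labeling budget exhausted
        # Disagreement among committee members is used as the
        # informativeness score (an assumed criterion).
        disagreement = pvariance([model(x) for model in committee])
        if disagreement > threshold:
            queried.append(i)
    return queried


# Toy committee of three regressors that disagree more for larger inputs.
committee = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x]
selected = stream_queries([0.0, 1.0, 0.1, 2.0, 3.0], committee,
                          threshold=0.5, budget=2)
```

In this toy run only the points where the committee disagrees strongly are selected, and the loop stops once the budget of two labels is spent.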
This study uses multisensory data (i.e., color and depth) to recognize human actions in the context of multimodal human-robot interaction. Here we employed the iCub robot to observe the predefined actions of the human partners by using four different tools on 20 objects. We show that the proposed multimodal ensemble learning leverages the complementary characteristics of three color cameras and one depth sensor, improving recognition accuracy in most cases compared to models trained with a single modality. The results indicate that the proposed models can be deployed on the iCub robot for tasks that require multimodal action recognition, including social tasks such as partner-specific adaptation and contextual behavior understanding, to mention a few.
The lack of standardization is a prominent issue in magnetic resonance (MR) imaging. This often causes undesired contrast variations due to differences in hardware and acquisition parameters. In recent years, MR harmonization using image synthesis with disentanglement has been proposed to compensate for the undesired contrast variations. Despite the success of existing methods, we argue that three major improvements can be made. First, most existing methods are built upon the assumption that multi-contrast MR images of the same subject share the same anatomy. This assumption is questionable since different MR contrasts are specialized to highlight different anatomical features. Second, these methods often require a fixed set of MR contrasts for training (e.g., both T1-weighted and T2-weighted images must be available), which limits their applicability. Third, existing methods are generally sensitive to imaging artifacts. In this paper, we present a novel approach, Harmonization with Attention-based Contrast, Anatomy, and Artifact Awareness (HACA3), to address these three issues. We first propose an anatomy fusion module that enables HACA3 to respect the anatomical differences between MR contrasts. HACA3 is also robust to imaging artifacts and can be trained and applied to any set of MR contrasts. Experiments show that HACA3 achieves state-of-the-art performance under multiple image quality metrics. We also demonstrate the applicability of HACA3 on downstream tasks with diverse MR datasets acquired from 21 sites with different field strengths, scanner platforms, and acquisition protocols.
A new development in NLP is the construction of hyperbolic word embeddings. As opposed to their Euclidean counterparts, hyperbolic embeddings are represented not by vectors, but by points in hyperbolic space. This makes the most common basic scheme for constructing document representations, namely the averaging of word vectors, meaningless in the hyperbolic setting. We reinterpret the vector mean as the centroid of the points represented by the vectors, and investigate various hyperbolic centroid schemes and their effectiveness at text classification.
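As an illustration of what a hyperbolic centroid can look like, here is a minimal sketch of one standard scheme: lift points from the Poincaré ball to the hyperboloid (Lorentz) model, take their weighted sum, and rescale it back onto the hyperboloid. Whether this is among the schemes the paper evaluates is not stated in the abstract, so treat it as an assumed example; all function names are hypothetical.

```python
import math


def poincare_to_hyperboloid(p):
    """Lift a point of the open unit (Poincare) ball onto the hyperboloid."""
    sq = sum(c * c for c in p)
    if sq >= 1.0:
        raise ValueError("point must lie strictly inside the unit ball")
    denom = 1.0 - sq
    return [(1.0 + sq) / denom] + [2.0 * c / denom for c in p]


def hyperboloid_to_poincare(x):
    """Project a hyperboloid point back into the Poincare ball."""
    return [c / (1.0 + x[0]) for c in x[1:]]


def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + x1*y1 + ... + xn*yn."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def hyperbolic_centroid(points, weights=None):
    """Weighted centroid of Poincare-ball points via the hyperboloid model.

    The weighted sum of lifted points is timelike for positive weights,
    so dividing by sqrt(-<s, s>_L) places it back on the hyperboloid.
    """
    if weights is None:
        weights = [1.0] * len(points)
    lifted = [poincare_to_hyperboloid(p) for p in points]
    dim = len(lifted[0])
    total = [sum(w * x[i] for w, x in zip(weights, lifted)) for i in range(dim)]
    norm = math.sqrt(-lorentz_inner(total, total))
    return hyperboloid_to_poincare([c / norm for c in total])


# Two word points placed symmetrically about the origin.
doc_point = hyperbolic_centroid([[0.5, 0.0], [-0.5, 0.0]])
```

Unlike the Euclidean mean of coordinates, this centroid always stays inside the ball and is pulled toward points near the boundary, which is why coordinate averaging is not a drop-in substitute in the hyperbolic setting.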
Facial recognition is fundamental to a wide variety of security systems operating in real-time applications. In video-surveillance-based face recognition, face images are typically captured over multiple frames in uncontrolled conditions, where head pose, illumination, shadowing, motion blur, and focus change over the sequence. Facial recognition tasks generally involve three fundamental operations: face detection, face alignment, and face recognition. This study presents comparative benchmark tables for state-of-the-art face recognition methods, testing them with the same backbone architecture in order to focus only on the face recognition solution rather than the network architecture. For this purpose, we constructed a video surveillance dataset of face IDs with high age variance and intra-class variance (face make-up, beard, etc.), using native surveillance facial imagery for evaluation. In addition, this work identifies the best recognition methods for different conditions such as non-masked faces, masked faces, and faces with glasses.
The global Information and Communications Technology (ICT) supply chain is a complex network consisting of all types of participants. It is often formulated as a Social Network to discuss the supply chain network's relations, properties, and development in supply chain management. Information sharing plays a crucial role in improving the efficiency of the supply chain, and datasheets are the most common data format to describe e-component commodities in the ICT supply chain because of their human readability. However, the surging number of electronic documents has far exceeded the capacity of human readers, and processing tabular data automatically is also challenging because of complex table structures and heterogeneous layouts. Table Structure Recognition (TSR) aims to represent tables with complex structures in a machine-interpretable format so that the tabular data can be processed automatically. In this paper, we formulate TSR as an object detection problem and propose to generate an intuitive representation of a complex table structure to enable structuring of the tabular data related to the commodities. To cope with border-less and small layouts, we propose a cost-sensitive loss function that considers the detection difficulty of each class. Besides, we propose a novel anchor generation method that exploits a characteristic of tables: columns in a table should share an identical height, and rows in a table should share the same width. We implement our proposed method based on Faster-RCNN and achieve 94.79% mean Average Precision (AP), consistently improving AP by more than 1.5% for different benchmark models.
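Cellular networks (LTE, 5G, and beyond) are growing dramatically, with high demand from consumers, and are more promising than other wireless networks thanks to advanced telecommunication technologies. The main goal of these networks is to connect billions of devices, systems, and users with high-speed data transmission, high cell capacity, and low latency, as well as to support a wide range of new applications such as virtual reality, the metaverse, telehealth, online education, autonomous vehicles, advanced manufacturing, and more. To achieve these goals, new approaches to spectrum management that use artificial intelligence (AI) methods are being adopted. This paper presents a vulnerability analysis of spectrum sensing approaches that use AI-based semantic segmentation models to identify cellular network signals under adversarial attacks, with a defensive distillation method applied. The results show that the mitigation method can significantly reduce the vulnerability of AI-based spectrum sensing models to adversarial attacks.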
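Although deep learning has had a significant impact on image/video restoration and super-resolution, learned deinterlacing has so far received less attention in academia or industry. This is despite the fact that deinterlacing is well suited to supervised learning from synthetic data, since the degradation model is known and fixed. In this paper, we propose a novel multi-field full frame-rate deinterlacing network, which adapts state-of-the-art super-resolution approaches to the deinterlacing task. Our model aligns features from adjacent fields to a reference field (the one to be deinterlaced) using deformable convolution residual blocks and self-attention. Our extensive experimental results demonstrate that the proposed method provides state-of-the-art deinterlacing results in terms of both numerical and perceptual performance. At the time of writing, our model ranks first at https://videopersing.ai/benchmarks/deinterlacer.html.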
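The remark that the degradation model is known and fixed can be sketched in a few lines: interlaced training inputs are produced from progressive frames simply by keeping alternating row parities. The code below is an assumed illustration, not the paper's data pipeline; the function names, the even-then-odd field order, and the line-doubling baseline are all hypothetical choices.

```python
def make_training_pairs(frames):
    """Build (field, parity, target_frame) triples from progressive frames.

    Frame t contributes only its rows of parity t % 2 (even rows, then odd
    rows, alternating), which mimics interlaced capture. Each field is
    paired with the full frame it came from, so one target frame exists
    per field (full frame-rate deinterlacing).
    """
    pairs = []
    for t, frame in enumerate(frames):
        parity = t % 2
        pairs.append((frame[parity::2], parity, frame))
    return pairs


def line_double(field, height):
    """Naive baseline: rebuild a frame by repeating each field row.

    Output row r copies field row r // 2, clamped to the last field row,
    which fills every missing row from its nearest available neighbor.
    """
    last = len(field) - 1
    return [list(field[min(r // 2, last)]) for r in range(height)]


# Two tiny 4x2 "frames" whose pixel values encode the row they came from.
frame0 = [[0, 0], [1, 1], [2, 2], [3, 3]]
frame1 = [[4, 4], [5, 5], [6, 6], [7, 7]]
pairs = make_training_pairs([frame0, frame1])
baseline = line_double(pairs[0][0], height=4)
```

A learned model would replace `line_double`, taking the reference field plus its neighbors as input and the target frame as supervision.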